Anomaly Detecting Within Dynamic Chinese Chat Text
نویسندگان
چکیده
The problem in processing Chinese chat text originates from the anomalous characteristics and dynamic nature of such a text genre. That is, it uses ill-edited terms and anomalous writing styles in chat text, and the anomaly is created and discarded very quickly. To handle this problem, one solution is to re-train the recognizer periodically. This costs a lot of manpower in producing the timely chat text corpus. The new approaches are proposed in this paper to detect the anomaly within dynamic Chinese chat text by incorporating standard Chinese corpora and chat corpus. We first model standard language text using standard Chinese corpora and apply these models to detect anomalous chat text. To improve detection quality, we construct anomalous chat language model using one static chat text corpus and incorporate this model into the standard language models. Our approaches calculate confidence and entropy for the input text and apply threshold values to help make the decisions. The experiments prove that performance equivalent to the best ones produced by the approaches in existence can be achieved stably with our approaches.
منابع مشابه
A Phonetic-Based Approach to Chinese Chat Text Normalization
Chatting is a popular communication media on the Internet via ICQ, chat rooms, etc. Chat language is different from natural language due to its anomalous and dynamic natures, which renders conventional NLP tools inapplicable. The dynamic problem is enormously troublesome because it makes static chat language corpus outdated quickly in representing contemporary chat language. To address the dyna...
متن کاملIntrospective Study of Emotion Icon in Public Chat as a Gesture of Texting
An emotion icon, better known as emoticon is a metacommunicative pictorial representation of a facial expression that, in the absence of body language and prosody, serves to draw a receiver's attention to the tenor or temper of a sender's nominal verbal communication, changing and improving its interpretation. The present study investigates the use of these nonverbal cues in whatsapp public cha...
متن کاملConstructing A Chinese Chat Language Corpus with A Two-Stage Incremental Annotation Approach
Chat language refers to the special human language widely used in the community of digital network chat. As chat language holds anomalous characteristics in forming words, phrases, and non-alphabetical characters, conventional natural language processing tools are ineffective to handle chat language text. Previous research shows that knowledge based methods perform less effectively in processin...
متن کاملConversation Extraction in Dynamic Text Message Stream
Text message stream which is produced by Instant Messager and Internet Relay Chat poses interesting and challenging problems for information technologies. It is beneficial to extract the conversations in this kind of chatting message stream for information management and knowledge finding. However, the data in text message stream are usually very short and incomplete, and it requires efficiency...
متن کاملChinese Novelty Mining
Automated mining of novel documents or sentences from chronologically ordered documents or sentences is an open challenge in text mining. In this paper, we describe the preprocessing techniques for detecting novel Chinese text and discuss the influence of different Part of Speech (POS) filtering rules on the detection performance. Experimental results on APWSJ and TREC 2004 Novelty Track data s...
متن کامل